A Nutritional Label for Rankings
Algorithmic decisions often result in scoring and ranking individuals to
determine creditworthiness, qualifications for college admissions and
employment, and compatibility as dating partners. While automatic and seemingly
objective, ranking algorithms can discriminate against individuals and
protected groups, and exhibit low diversity. Furthermore, ranked results are
often unstable: small changes in the input data or in the ranking
methodology may lead to drastic changes in the output, making the result
uninformative and easy to manipulate. Similar concerns apply in cases where
items other than individuals are ranked, including colleges, academic
departments, or products.
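The instability described above can be illustrated with a minimal sketch. It assumes (hypothetically, not taken from Ranking Facts) that items are scored with a weighted sum of two attributes and that stability is summarized as the overlap between the top-k sets before and after a small change in the weights. The item names, attribute values and the `rank` / `top_k_overlap` helpers are illustrative only.

```python
"""Sketch: a small change in scoring weights can reorder a ranking.

Hypothetical setup: weighted-sum scoring and top-k overlap as a stability
summary. Neither is claimed to be the method used by Ranking Facts.
"""
from typing import Dict, List, Sequence


def rank(items: Dict[str, Sequence[float]], weights: Sequence[float]) -> List[str]:
    """Return item names sorted by descending weighted-sum score."""
    def score(name: str) -> float:
        return sum(w * x for w, x in zip(weights, items[name]))
    # Ties are broken by name so the output is deterministic.
    return sorted(items, key=lambda name: (-score(name), name))


def top_k_overlap(a: List[str], b: List[str], k: int) -> float:
    """Fraction of top-k items shared by two rankings (1.0 = same top-k set)."""
    return len(set(a[:k]) & set(b[:k])) / k


# Hypothetical items with two attributes each, e.g. (research, teaching).
items = {
    "A": (0.9, 0.3),
    "B": (0.6, 0.8),
    "C": (0.7, 0.6),
    "D": (0.4, 0.85),
}
base = rank(items, (0.5, 0.5))
shifted = rank(items, (0.65, 0.35))
print(base, shifted, top_k_overlap(base, shifted, 2))
```

With these toy numbers, shifting 15% of the weight between attributes moves item A from last place to first and halves the top-2 overlap.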
In this demonstration we present Ranking Facts, a Web-based application that
generates a "nutritional label" for rankings. Ranking Facts is made up of a
collection of visual widgets that implement our latest research results on
fairness, stability, and transparency for rankings, and that communicate
details of the ranking methodology, or of the output, to the end user. We will
showcase Ranking Facts on real datasets from different domains, including
college rankings, criminal risk assessment, and financial services.
Comment: 4 pages, SIGMOD demo, 3 figures, ACM SIGMOD 201
Appearance frequency modulated gene set enrichment testing
Abstract
Background
Gene set enrichment testing has helped bridge the gap from an individual gene to a systems biology interpretation of microarray data. Although gene sets are defined a priori based on biological knowledge, current methods for gene set enrichment testing treat all genes equally. It is well known that some genes, such as those responsible for housekeeping functions, appear in many pathways, whereas other genes are more specialized and play a unique role in a single pathway. Drawing inspiration from the field of information retrieval, we have developed, and present here, an approach that incorporates gene appearance frequency (in KEGG pathways) into two current methods, Gene Set Enrichment Analysis (GSEA) and the logistic regression-based LRpath framework, to generate more reproducible and biologically meaningful results.
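The information-retrieval analogy can be sketched as follows. This assumes an inverse-document-frequency style weight, w_g = log(P / p_g), where P is the number of pathways and p_g the number of pathways containing gene g; the exact weighting used by GSEA-AF and LRpath-AF may differ. The pathway and gene identifiers are hypothetical, and the weighted mean is a simplified stand-in for a full enrichment statistic.

```python
"""Sketch: IDF-style appearance-frequency weights for genes.

Assumption: w_g = log(P / p_g), mirroring inverse document frequency.
This is not claimed to be the published GSEA-AF / LRpath-AF formula.
"""
import math
from collections import Counter
from typing import Dict, Set


def appearance_weights(pathways: Dict[str, Set[str]]) -> Dict[str, float]:
    """Down-weight genes that appear in many pathways (e.g. housekeeping genes)."""
    counts = Counter(g for genes in pathways.values() for g in genes)
    total = len(pathways)
    return {g: math.log(total / c) for g, c in counts.items()}


def weighted_set_score(stats: Dict[str, float], genes: Set[str],
                       weights: Dict[str, float]) -> float:
    """Weighted mean of per-gene statistics (e.g. t-scores) over one gene set."""
    members = [g for g in genes if g in stats]
    total_w = sum(weights.get(g, 0.0) for g in members)
    if total_w == 0:
        return 0.0  # no informative genes in this set
    return sum(weights.get(g, 0.0) * stats[g] for g in members) / total_w


# Hypothetical pathways: G1 is shared by three pathways, G2 is specific to P1.
pathways = {"P1": {"G1", "G2"}, "P2": {"G1", "G3"},
            "P3": {"G1", "G4"}, "P4": {"G5"}}
weights = appearance_weights(pathways)
stats = {"G1": 1.0, "G2": 3.0}
score = weighted_set_score(stats, pathways["P1"], weights)
```

Because the pathway-specific gene G2 carries more weight than the widely shared G1, the score for P1 lies above the unweighted mean of 2.0.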
Results
Two breast cancer microarray datasets were analyzed to identify gene sets differentially expressed between histological grade 1 and grade 3 breast cancer. The correlation of Normalized Enrichment Scores (NES) between gene sets, generated by the original GSEA and by GSEA with the appearance frequency of genes incorporated (GSEA-AF), was compared. GSEA-AF resulted in higher correlation between experiments and more overlapping top gene sets. Several cancer-related gene sets also achieved higher NES in GSEA-AF. The same datasets were also analyzed by LRpath and by LRpath with the appearance frequency of genes incorporated (LRpath-AF). Two well-studied lung cancer datasets were analyzed in the same manner to demonstrate the validity of the method, and similar results were obtained.
Conclusions
We introduce an alternative way to integrate KEGG PATHWAY information into gene set enrichment testing. The performance of GSEA and LRpath can be enhanced by integrating the appearance frequency of genes. We conclude that, generally, gene set analysis methods that integrate information from KEGG PATHWAY perform better both statistically and biologically.
http://deepblue.lib.umich.edu/bitstream/2027.42/112430/1/12859_2010_Article_4457.pd
Network analysis of genes regulated in renal diseases: implications for a molecular-based classification
Abstract
Background
Chronic renal diseases are currently classified based on morphological similarities such as whether they produce predominantly inflammatory or non-inflammatory responses. However, such classifications do not reliably predict the course of the disease and its response to therapy. In contrast, recent studies in diseases such as breast cancer suggest that a classification which includes molecular information could lead to more accurate diagnoses and prediction of treatment response. This article describes how we extracted gene expression profiles from biopsies of patients with chronic renal diseases, and used network visualizations and associated quantitative measures to rapidly analyze similarities and differences between the diseases.
Results
The analysis revealed three main regularities: (1) Many genes associated with a single disease, and fewer genes associated with many diseases. (2) Unexpected combinations of renal diseases that share relatively large numbers of genes. (3) Uniform concordance in the regulation of all genes in the network.
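Regularities (1) and (2) correspond to two simple measures on a bipartite disease-gene network. The sketch below assumes a hypothetical input in which each disease maps to the set of genes found regulated in its biopsies; the disease labels and gene names are placeholders, and regularity (3), concordance of regulation direction, is not modelled here.

```python
"""Sketch: basic measures on a bipartite disease-gene network.

Hypothetical input format: {disease: set of regulated genes}.
"""
from collections import Counter
from itertools import combinations
from typing import Dict, Set, Tuple


def gene_degree_histogram(network: Dict[str, Set[str]]) -> Counter:
    """Map 'number of diseases a gene is linked to' -> 'number of such genes'."""
    degree = Counter(g for genes in network.values() for g in genes)
    return Counter(degree.values())


def shared_genes(network: Dict[str, Set[str]]) -> Dict[Tuple[str, str], int]:
    """Number of genes shared by every pair of diseases."""
    return {
        (a, b): len(network[a] & network[b])
        for a, b in combinations(sorted(network), 2)
    }


# Toy data with placeholder disease and gene names.
network = {
    "disease_A": {"g1", "g2", "g3", "g4"},
    "disease_B": {"g4", "g5"},
    "disease_C": {"g3", "g4", "g6"},
}
print(gene_degree_histogram(network))
print(shared_genes(network))
```

In the toy data most genes (four of six) are linked to a single disease and only one gene is linked to all three, which is the shape of regularity (1); the pairwise counts expose which disease pairs share unusually many genes, as in regularity (2).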
Conclusion
The overall results suggest the need to define a molecular-based classification of renal diseases, as well as to develop hypotheses for the unexpected patterns of shared genes and the uniformity in gene concordance. Furthermore, the results demonstrate the utility of network analyses for rapidly understanding complex relationships between diseases and regulated genes.
http://deepblue.lib.umich.edu/bitstream/2027.42/112463/1/12859_2009_Article_3354.pd
D-SPACE4Cloud: A Design Tool for Big Data Applications
Recent years have seen a steep rise in data generation worldwide, with the
development and widespread adoption of several software projects targeting the
Big Data paradigm. Many companies currently engage in Big Data analytics as
part of their core business activities; nonetheless, there are no tools and
techniques to support the design of the underlying hardware configuration
backing such systems. In particular, this report focuses on Cloud-deployed
clusters, which represent a cost-effective alternative to on-premises
installations. We propose a novel tool implementing a battery of optimization
and prediction techniques, integrated so as to efficiently assess several
alternative resource configurations in order to determine the minimum-cost
cluster deployment satisfying QoS constraints. Further, the experimental
campaign conducted on real systems shows the validity and relevance of the
proposed method.
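The core search problem, finding the cheapest configuration that still meets a QoS constraint, can be sketched as an exhaustive search over (VM type, VM count) pairs. Everything concrete here is a hypothetical placeholder: the `VMType` catalogue, the prices, and the toy `predicted_hours` model (perfectly parallel work plus a fixed overhead). The tool described above uses its own optimization and prediction techniques, which this sketch does not reproduce.

```python
"""Sketch: cheapest cluster configuration that meets a deadline.

Assumptions (hypothetical): VM catalogue, prices, and a toy performance
model. A real tool would plug in a calibrated predictor instead.
"""
from dataclasses import dataclass
from typing import Iterable, Optional, Tuple


@dataclass(frozen=True)
class VMType:
    name: str
    cores: int
    hourly_price: float  # price per VM per hour


def predicted_hours(vm: VMType, count: int, work_core_hours: float,
                    overhead_hours: float = 0.1) -> float:
    """Toy model: perfectly parallel work plus a fixed overhead."""
    return work_core_hours / (vm.cores * count) + overhead_hours


def cheapest_config(catalogue: Iterable[VMType], work_core_hours: float,
                    deadline_hours: float,
                    max_vms: int = 100) -> Optional[Tuple[VMType, int, float]]:
    """Return (VM type, VM count, cost) of the cheapest feasible
    configuration, or None if no configuration meets the deadline."""
    best = None
    for vm in catalogue:
        for count in range(1, max_vms + 1):
            hours = predicted_hours(vm, count, work_core_hours)
            if hours > deadline_hours:
                continue  # violates the QoS (deadline) constraint
            cost = hours * count * vm.hourly_price
            if best is None or cost < best[2]:
                best = (vm, count, cost)
    return best


catalogue = [VMType("small", 4, 0.2), VMType("large", 16, 0.9)]
best = cheapest_config(catalogue, work_core_hours=64.0, deadline_hours=2.0)
```

Exhaustive enumeration is only practical for small catalogues; it is used here because it is easy to verify, not because it reflects how the proposed tool explores the configuration space.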
The network structure of visited locations according to geotagged social media photos
Businesses, tourism attractions, public transportation hubs and other points
of interest are not isolated but part of a collaborative system. Bringing such
collaborative networks to the surface is not always easy. Data-rich
environments can assist in the reconstruction of collaborative networks,
shedding light on how their members operate and revealing a potential
for value creation via collaborative approaches. Social media data are one
means of accomplishing this task. In this paper, we reconstruct a
network of tourist locations using fine-grained data from Flickr, an online
community for photo sharing. We used a publicly available set of Flickr
data provided by Yahoo! Labs. To analyse the complex structure of tourism
systems, we reconstructed a network of visited locations in Europe,
resulting in around 180,000 vertices and over 32 million edges. An analysis of
the resulting network properties reveals its complex structure.
Comment: 8 pages, 3 figures
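One plausible way to turn photo records into a network of visited locations is sketched below. It assumes a hypothetical rule: two locations are linked when the same user photographs them consecutively, and the edge weight counts how often that happens. The `Photo` tuple layout and the location names are placeholders; the paper's exact construction may differ.

```python
"""Sketch: network of visited locations from geotagged photos.

Assumption (hypothetical rule): consecutive distinct locations in a user's
chronological photo stream are linked by an undirected, weighted edge.
"""
from collections import defaultdict
from typing import Dict, Iterable, Tuple

Photo = Tuple[str, float, str]  # (user_id, timestamp, location_id)


def location_network(photos: Iterable[Photo]) -> Dict[Tuple[str, str], int]:
    """Return undirected weighted edges {(loc_a, loc_b): count} with loc_a < loc_b."""
    by_user = defaultdict(list)
    for user, ts, loc in photos:
        by_user[user].append((ts, loc))
    edges = defaultdict(int)
    for visits in by_user.values():
        visits.sort()  # chronological order per user
        for (_, a), (_, b) in zip(visits, visits[1:]):
            if a != b:  # ignore repeated shots at the same place
                edges[tuple(sorted((a, b)))] += 1
    return dict(edges)


photos = [
    ("u1", 1, "louvre"), ("u1", 2, "eiffel"), ("u1", 3, "eiffel"),
    ("u2", 5, "eiffel"), ("u2", 9, "louvre"), ("u2", 12, "colosseum"),
]
edges = location_network(photos)
```

At the scale reported above (about 180,000 vertices and over 32 million edges) the same grouping step would typically run in a distributed or streaming setting; the dictionary-based version is only meant to show the logic.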
Modeling performance of Hadoop applications: A journey from queueing networks to stochastic well formed nets
Nowadays, many enterprises commit to the extraction of actionable knowledge from huge datasets as part of their core business activities. Applications belong to very different domains, such as fraud detection or one-to-one marketing, and encompass business analytics and support for decision making in both the private and public sectors. In these scenarios, a central place is held by the MapReduce framework and, in particular, its open-source implementation, Apache Hadoop. In such environments, new challenges arise in the area of job performance prediction, with the need to provide Service Level Agreement guarantees to the end user and to avoid wasting computational resources. In this paper we provide performance analysis models to estimate MapReduce job execution times in Hadoop clusters governed by the YARN Capacity Scheduler. We propose models of increasing complexity and accuracy, ranging from queueing networks to stochastic well formed nets, able to estimate job performance under a number of scenarios of interest, including unreliable resources. The accuracy of our models is evaluated by running the TPC-DS industry benchmark on Amazon EC2 and at the CINECA Italian supercomputing center. The results have shown that the average accuracy we can achieve is in the range 9–14%.
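To give a feel for the simplest end of the model spectrum (queueing networks), the sketch below implements textbook exact Mean Value Analysis (MVA) for a single-class closed queueing network. It is a generic technique, not the model from the paper; the mapping of concurrent jobs to customers and cluster resources to stations with service demands `demands[k]` (plus an optional delay `think_time`) is an illustrative assumption.

```python
"""Sketch: exact single-class Mean Value Analysis for a closed queueing network.

Generic textbook recursion; the Hadoop-specific mapping is hypothetical.
"""
from typing import List, Tuple


def mva(demands: List[float], n_jobs: int,
        think_time: float = 0.0) -> Tuple[float, float]:
    """Return (throughput, response time) with n_jobs customers circulating."""
    queue = [0.0] * len(demands)  # mean queue length at each station
    throughput, response = 0.0, 0.0
    for n in range(1, n_jobs + 1):
        # Arrival theorem: an arriving job sees the mean queues of the n-1 network.
        residence = [d * (1.0 + q) for d, q in zip(demands, queue)]
        response = sum(residence)
        throughput = n / (think_time + response)
        # Little's law applied per station.
        queue = [throughput * r for r in residence]
    return throughput, response


x1, r1 = mva([2.0], 2)
x2, r2 = mva([1.0, 1.0], 2)
```

With one station of demand 2 and two jobs, throughput saturates at 1/2 and the response time doubles to 4; richer formalisms such as stochastic well formed nets are needed for effects this recursion cannot express, such as scheduler policies or unreliable resources.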
Algorithms and Bounds for Drawing Directed Graphs
In this paper we present a new approach to visualize directed graphs and
their hierarchies that completely departs from the classical four-phase
framework of Sugiyama and computes readable hierarchical visualizations that
contain the complete reachability information of a graph. Additionally, our
approach has the advantage that only the necessary edges are drawn, thus
reducing the visual complexity of the resulting drawing.
Furthermore, most problems involved in our framework require only polynomial
time. Our framework offers a suite of solutions depending upon the
requirements, and it consists of only two steps: (a) the cycle removal step (if
the graph contains cycles) and (b) the channel decomposition and hierarchical
drawing step. Our framework does not introduce any dummy vertices and it keeps
the vertices of a channel vertically aligned. The time complexity of the main
drawing algorithms of our framework is $O(kn)$, where $k$ is the number of
channels, typically much smaller than $n$ (the number of vertices).
Comment: Appears in the Proceedings of the 26th International Symposium on
Graph Drawing and Network Visualization (GD 2018)
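The channel decomposition step can be illustrated with a simple greedy heuristic. This sketch assumes a channel is a directed path and that the input is already acyclic (i.e. after the cycle removal step); it does not guarantee the minimum number of channels and is not the authors' algorithm. It relies on `graphlib.TopologicalSorter` from the Python standard library (3.9+); the vertex names are placeholders.

```python
"""Sketch: greedy decomposition of a DAG into channels (vertex-disjoint paths).

Illustrative heuristic only; input is assumed to be acyclic.
"""
from graphlib import TopologicalSorter
from typing import Dict, List, Set


def greedy_channels(succ: Dict[str, Set[str]]) -> List[List[str]]:
    """Return a list of channels; each channel is a directed path in the DAG."""
    # graphlib expects a predecessor map, so invert the successor map.
    preds: Dict[str, Set[str]] = {v: set() for v in succ}
    for u, vs in succ.items():
        for v in vs:
            preds.setdefault(v, set()).add(u)
    tail_of: Dict[str, int] = {}  # current last vertex of a channel -> channel index
    channels: List[List[str]] = []
    for v in TopologicalSorter(preds).static_order():
        # Extend a channel whose tail has an edge into v, if one exists.
        host = next((u for u in sorted(preds[v]) if u in tail_of), None)
        if host is None:
            channels.append([v])
            tail_of[v] = len(channels) - 1
        else:
            idx = tail_of.pop(host)
            channels[idx].append(v)
            tail_of[v] = idx
    return channels


example = {"a": {"b", "c"}, "b": {"d"}, "c": {"d"}, "d": set()}
print(greedy_channels(example))
```

On this diamond-shaped graph the heuristic yields two channels (for example `[a, b, d]` and `[c]`), so $k = 2$ while $n = 4$; the exact channels depend on the order in which `static_order()` emits `b` and `c`.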
Understanding the adoption of business analytics and intelligence
Cruz-Jesus, F., Oliveira, T., & Naranjo, M. (2018). Understanding the adoption of business analytics and intelligence. In Á. Rocha, H. Adeli, L. P. Reis, & S. Costanzo (Eds.), Trends and Advances in Information Systems and Technologies, pp. 1094-1103. (Advances in Intelligent Systems and Computing; Vol. 745). Springer Verlag. DOI: 10.1007/978-3-319-77703-0_106
Our work addresses the factors that influence the adoption of business analytics and intelligence (BAI) among firms. Grounded in some of the most prominent adoption models for technological innovations, we developed a conceptual model especially suited for BAI. Based on this model, we propose an instrument from which relevant hypotheses will be derived and tested by means of statistical analysis. We hope that the findings derived from our analysis may offer important insights for practitioners and researchers regarding the drivers that lead to BAI adoption in firms. Although other studies have already focused on the adoption of technological innovations by firms, research on BAI is scarce, hence the relevance of our research.
- …